49 research outputs found

    Recent advances in the Self-Referencing Embedding Strings (SELFIES) library

    String-based molecular representations play a crucial role in cheminformatics applications and, with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencIng Embedded Strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Comment: 11 pages, 2 figures

    Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

    The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in developing realistic benchmarks that reflect the complexity of molecular design for real-world applications. In this work, we develop a set of practical benchmark tasks relying on physical simulation of molecular systems mimicking real-life molecular design problems for materials, drugs, and chemical reactions. Additionally, we demonstrate the utility and ease of use of our new benchmark set by showing how to compare the performance of several well-established families of algorithms. Surprisingly, we find that model performance can strongly depend on the benchmark domain. We believe that our benchmark suite will help move the field towards more realistic molecular design benchmarks, and move the development of inverse molecular design algorithms closer to designing molecules that solve existing problems in both academia and industry. Comment: 29+21 pages, 6+19 figures, 6+2 tables

    On scientific understanding with artificial intelligence

    Imagine an oracle that correctly predicts the outcome of every particle physics experiment, the products of every chemical reaction, or the function of every protein. Such an oracle would revolutionize science and technology as we know them. However, as scientists, we would not be satisfied with the oracle itself. We want more. We want to comprehend how the oracle conceived these predictions. This feat, denoted as scientific understanding, has frequently been recognized as the essential aim of science. Now, the ever-growing power of computers and artificial intelligence poses one ultimate question: How can advanced artificial systems contribute to scientific understanding or achieve it autonomously? We are convinced that this is not a mere technical question but lies at the core of science. Therefore, here we set out to answer where we are and where we can go from here. We first seek advice from the philosophy of science to understand scientific understanding. Then we review the current state of the art, both from literature and by collecting dozens of anecdotes from scientists about how they acquired new conceptual understanding with the help of computers. Those combined insights help us to define three dimensions of android-assisted scientific understanding: The android as a I) computational microscope, II) resource of inspiration and the ultimate, not yet existent III) agent of understanding. For each dimension, we explain new avenues to push beyond the status quo and unleash the full power of artificial intelligence's contribution to the central aim of science. We hope our perspective inspires and focuses research towards androids that get new scientific understanding and ultimately bring us closer to true artificial scientists. Comment: 13 pages, 3 figures, comments welcome

    SELFIES and the future of molecular string representations

    Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science. Comment: 34 pages, 15 figures, comments and suggestions for additional references are welcome

    SELFIES and the future of molecular string representations

    Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencIng Embedded Strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.

    Mechanistic Investigations of C-H Activation Reactions

    Title differs from the original, per the author's translation. The starting point of this master's thesis was a Rh(I)-catalysed direct alkylation of benzylic amines at the benzylic carbon atom, directed by 3-substituted pyridin-2-yl groups and employing either alkyl bromides or alkenes. Preliminary mechanistic investigations had shown that, in the reaction with alkyl bromides, beta-H elimination is fast and the corresponding alkenes are formed as intermediates. Since the reaction conditions are also almost identical, both reactions are likely to proceed via similar mechanisms. The goal of this project was to investigate the mechanisms of these reactions further, focusing on the transformation using alkenes. First, the reaction using alkenes, formally a C(sp³)-H activation, was revealed to proceed via imine intermediates and hence via a C(sp²)-H activation pathway. The reaction shows a primary kinetic isotope effect (KIE) of 4.3 at the benzylic C-H position together with a reversible H-D exchange at the same position, which indicates that there are at least two distinct steps in which the corresponding C-H bonds are broken. The imine intermediates, which are detected throughout the whole reaction period, are converted to the final product under the reaction conditions, and a time-course analysis of the alkylated imine intermediate shows that it is formed before the final amine product. Second, K₂CO₃, which is effectively insoluble in the reaction mixture under the reaction conditions, was shown to be needed only at the beginning of the reaction using alkenes. During this induction period, K₂CO₃ dissolves to a very small extent and the Rh catalyst reacts irreversibly with it to form the catalytically active species. The duration of this induction period depends on the concentration, the specific surface area, and the water content of K₂CO₃ and on the agitation of the reaction mixture; all of these dependences can be rationalised on the basis of a detailed kinetic model. Third, the reaction using alkyl bromides was revealed to yield several unexpected side products. The alkylated product is accompanied by an alkylated side product whose alkyl chain is one carbon atom shorter. In addition, one of the side products is an amide of the benzylic amine bearing a carbon chain that is one carbon atom longer than that of the alkyl bromide employed. Intensive screenings were carried out to increase the selectivity of the reaction with alkyl bromides, and the addition of secondary alcohols proved effective at the beginning of the reaction, although it also slows the reaction down significantly. However, this selectivity is lost as the reaction approaches full conversion.

    A General Fitting Function to Estimate Apparent Reaction Orders of Kinetic Profiles

    The rapid development of analytical methods in recent decades has resulted in a wide range of readily available and accurate reaction-monitoring techniques, which allow for easy determination of high-quality concentration-time data of chemical reactions. However, while the acquisition of kinetic data has become routine in the development of new chemical reactions and the study of their mechanisms, not all the information contained therein is utilized because of a lack of suitable analysis tools, which unnecessarily complicates mechanistic studies. Herein, we report on a general method to analyze a single concentration-time profile of chemical reactions and extract information regarding the reaction order with respect to substrates, the presence of multiple kinetic regimes, and the presence of kinetic complexities, such as catalyst deactivation, product inhibition, and substrate decomposition.
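
    As a rough illustration of what an apparent reaction order extracted from a single concentration-time profile means, the sketch below assumes a simple power-law rate d[A]/dt = -k[A]^n and estimates n as the slope of ln(rate) against ln([A]). This is a generic textbook differential method, not the fitting function reported in the paper; the function names `simulate_profile` and `apparent_order` and the synthetic data are hypothetical.

```python
import math


def simulate_profile(a0, k, n, times):
    """Closed-form [A](t) for the assumed rate law d[A]/dt = -k [A]^n.

    Assumption: for n < 1 the caller keeps t below the time at which
    the substrate is fully consumed.
    """
    if abs(n - 1.0) < 1e-12:
        return [a0 * math.exp(-k * t) for t in times]
    return [(a0 ** (1 - n) + (n - 1) * k * t) ** (1 / (1 - n)) for t in times]


def apparent_order(times, conc):
    """Estimate (n, k) by least squares on ln(rate) = ln(k) + n ln([A]).

    Rates are finite differences between neighbouring points, paired
    with the mean concentration of each interval.
    """
    xs, ys = [], []
    for i in range(len(times) - 1):
        rate = -(conc[i + 1] - conc[i]) / (times[i + 1] - times[i])
        mid = 0.5 * (conc[i] + conc[i + 1])
        if rate > 0 and mid > 0:  # skip points where the log is undefined
            xs.append(math.log(mid))
            ys.append(math.log(rate))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free second-order profile (illustrative values only).
    times = [0.1 * i for i in range(101)]
    conc = simulate_profile(a0=1.0, k=0.5, n=2.0, times=times)
    n_est, k_est = apparent_order(times, conc)
    print(f"apparent order ~ {n_est:.2f}, rate constant ~ {k_est:.2f}")
```

    On real data, finite-difference rates amplify measurement noise, and a single power law cannot capture changes of kinetic regime, catalyst deactivation, or product inhibition; the abstract above names these complexities as what a more general analysis should detect.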

    London Dispersion in Molecular Systems


    Origin of the Immiscibility of Alkanes and Perfluoroalkanes

    ISSN: 0002-7863, ISSN: 1520-512